Differentiating Effective Data Mining From Fishing, Trapping, and Cruelty to Numbers

Just Right or Too Much of a Good Thing?

It is said that politicians use statistics the way that an inebriated person uses a lamppost: "for support, not illumination." In managed care pharmacy today, some would argue that the same has become true of analyses of medical or pharmacy administrative claims data. The reasoning goes that, given a claims dataset and enough time to massage the data, one can set out to prove nearly anything and produce the desired answer. Is the accusation justified?

Compared with other types of research such as randomized controlled trials or patient surveys, retrospective analyses of administrative claims present greater potential for violations of ethical research standards. With a typical database and minimal effort, it is possible (not appropriate, but possible) to recalculate study results post hoc using seemingly endless combinations of methodological decisions. Some of the opportunities to revise study results, whether for manipulation or for legitimate scientific inquiry, include decisions about these questions:

• How many claims during what period of time constitute a drug user?
• How long is the washout period that defines a "new start" with the medication?
• Which diagnosis codes in which positions (primary, secondary, tertiary) on how many medical claims constitute the appropriate inclusion criteria?
• For how long should patients be followed and continuous eligibility be required for inclusion in the sample?
• How should the researcher translate a broad concept, such as noncompliance or treatment success, into measurable decision rules?

So many study design changes are possible, all at the push of a computer key. This ability to create multiple scenarios so easily has precipitated the lure of the "fishing expedition," in which repeated attempts are made to produce a particular desired finding. Unfortunately, this approach poses a substantial risk of generating incorrect information; while the resulting finding might be appealing, it might also represent nothing more than sampling error. A statistical significance standard of P <0.05 refers to a 1 in 20 probability of "Type 1" error, falsely detecting a statistically significant result when outcomes are actually due to chance. After just 10 independent attempts using a statistical significance standard of P <0.05, the probability of obtaining at least 1 false-positive result is about 40%. After 20 attempts, that probability increases to about 64%.
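The arithmetic behind this inflation is simple to verify. Below is a minimal sketch (Python), assuming independent tests at the conventional 0.05 threshold; it is an illustration, not part of any cited study:

```python
# Probability of at least 1 false-positive (Type 1 error) result across
# n independent significance tests, each conducted at alpha = 0.05.
def familywise_error(n_tests: int, alpha: float = 0.05) -> float:
    # Each test has a (1 - alpha) chance of correctly finding nothing;
    # the chance that all n tests do so is (1 - alpha) ** n.
    return 1 - (1 - alpha) ** n_tests

for n in (1, 10, 20):
    print(f"{n:2d} tests: {familywise_error(n):.0%} chance of >=1 false positive")
# 1 test: 5%; 10 tests: 40%; 20 tests: 64%
```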

However, the very feature of claims database research that is a major source of ethical and statistical shortcomings, the ready ability to perform post hoc analysis, is also a key tool in avoiding or mitigating those shortcomings. Used properly, for legitimate scientific inquiry and not to support a predetermined outcome of interest, post hoc analysis facilitates a candid and thorough presentation of study findings and ultimately a more useful research product. How to use this tool ethically and effectively is examined here, beginning with published examples of common problems in database analysis and then turning to publications that illustrate "best practices."

Common Problems in Claims Database Analysis

Boosting Academic Achievement With Refrigerators: Association Versus Cause and Effect
Sociologist James Coleman's 1966 report on educational opportunity, which linked parental socioeconomic status to childhood academic achievement for the first time, has influenced educational policy in the United States for decades.3 But 5 years after the publication of Coleman's work, a reanalysis of his data showed that knowing only whether a child's household contained 9 common items (e.g., television set, refrigerator) produced a reasonably accurate prediction of the child's verbal achievement (correlations of 0.72-0.80).4 Commenting on the reanalysis, statistician Elazar Pedhazur pointed out that no one would be so foolish as to purchase the 9 household items for all the underachieving children in the United States in an effort to boost their academic performance. Yet, he argued, many researchers fall into exactly the same trap when they mistake prediction (an association between two phenomena) for explanation (a cause-and-effect relationship).4 Pedhazur's hypothetical naive researcher fails to understand that unmeasured causes, such as parental income, can influence both the purchase of household items and the achievement of children, thereby creating the appearance of a causal relationship where none exists.
Claims database research in managed care pharmacy is particularly vulnerable to confusion between association and causation. Studies documenting associations between suspected causal factors (e.g., a particular diagnosis, drug, or benefit design feature) and outcomes (e.g., compliance, cost, adverse event) are common, perhaps in part because they are relatively easy to perform. While there is nothing inherently wrong in documenting associations between events, trouble arises when researchers attribute causality to the associations, sometimes making policy recommendations based on the assumed causal mechanism, when cause and effect has not been shown or even explored. Two common examples of suboptimal practice are discussed below.

Cost of Illness(es): All of Them
It is common and appropriate for claims database studies to assess the health care costs associated with particular disease states, often including measurement of total health care costs. But work of this type "crosses the line" when it ascribes total health care costs to a particular medical condition or treatment pattern (e.g., noncompliance, medication choice, diagnosis) without examining whether the services used had any relationship to either the medication or the condition being treated.5-7 One such analysis in the medical literature compared payers' total (all-cause) health care costs for health plan enrollees with and without a diagnosis of atrial fibrillation (AFIB). Multivariate analysis and study design controlled for enrollee demographics, health plan, and other health conditions as measured by the Charlson Comorbidity Index (CCI). Notably, the authors removed conditions "that may have been caused by AFIB," including heart failure and stroke, from the CCI; this decision left the health care cost analysis unadjusted for these conditions. The study found that AFIB patients had higher all-cause health care costs and higher rates of cardiovascular comorbidities than did plan enrollees who did not have AFIB. Without investigating the procedures associated with the increased health care service utilization, the authors described the $12,349 difference in per-person all-cause annual cost between enrollees with and without AFIB as the "direct cost burden" of AFIB. Similarly, without measuring cost change associated with AFIB treatment or comparing treated versus untreated AFIB cases, the authors concluded that "the successful treatment of AFIB may ... have substantial health care cost savings benefits."7

Both conclusions were unfounded. On the basis of the work actually performed, one could appropriately conclude only that AFIB patients had more comorbidities and higher expenses. The degree to which AFIB caused the increased costs or comorbidities was not investigated. Since the AFIB patients had much higher rates of cardiovascular conditions, including heart failure (relative risk [RR] 29.6), other arrhythmias or conduction disorders (RR 16.9), heart attack (RR 8.5), and stroke (RR 6.6), it is not surprising that their total health care costs were higher. Thus, instead of being a causal agent producing higher costs, AFIB could have been a marker for heart disease severity, a consequence of the comorbid heart conditions such as stroke or heart attack, or a correlate of other unmeasured factors that were driving total health care cost. The conclusion that treating AFIB might substantially reduce total health care expense was not supported by study findings.
As we shall see from the "best practice" examples, a much more informative analysis could have been performed by sampling clinically homogeneous groups and by examining the health care procedures used by AFIB and non-AFIB enrollees to assess how much of the additional cost was actually due to AFIB treatment. Notably, the AFIB study's authors indicated that an investigation of the actual drivers of cost would "complete the cost picture of this condition" but described this analysis as "an area for future research."

"Trapping" Explanations: Hinting at Causation Without Measuring It
A variation on the practice of describing association as causation is an approach in which the researcher hints, without actually measuring, that a particular feature of a drug or therapeutic class produces a specific outcome. One study from the medical literature examined the association between all-cause health services use and depression treatment consistent with "clinical guidelines" (from the Canadian Network for Mood and Anxiety Treatments [CANMAT]), including drug, dose, and duration. The recommended first-line drugs included all mechanisms of action: citalopram, fluoxetine, fluvoxamine, paroxetine, sertraline, bupropion, nefazodone, venlafaxine, moclobemide (a monoamine oxidase inhibitor), and imipramine.6 The authors concluded that greater guideline concordance was associated with increased visits to the prescribing physician, reduced inpatient admissions, and no significant differences in emergency room visits. Amazingly, neither medication side effects nor depressive symptoms were measured by the study, but the authors, consultants to pharmaceutical manufacturers, attributed study results to the side-effect profiles of the first-line medications that "may be more favourable than [those] of other antidepressants, which in turn increases patients' adherence to medication thereby allowing them to receive the full benefit of antidepressant therapy."6 Since the purported causal mechanism (reduced side effects producing better adherence and increased resolution of depressive symptoms) was not investigated, the conclusion that outcomes were attributable to this mechanism was unfounded.
This example of attributing outcomes to unmeasured attributes is depicted in Figure 1. Even without this obvious disconnect between cause and effect, the reader might be tipped to the flawed method by the inconsistency in the outcomes: concordance with antidepressant "guidelines" was associated with reduced inpatient hospital use but not with reduced emergency room visits.

Figure 1. Measurement and Causal Attribution in Antidepressant Guideline Concordance Study6

Cruelty to Numbers in the Enchanted Forest of Statistics
"Torturen umbers," says writer Gregg Easterbrook, "and they'll confess to anything." 1 In claims database analyses Measurement and Causal Attribution in Antidepressant Guideline Concordance Study 6 employing retrospective designs, it is common for the groups being compared to differ in important ways. Covariates (predictive or explanatoryf actors) may be highly correlated with each other and with the study' so utcome measure(s). While these circumstances often requirer esearchers to use multivariate statistical techniques, doing so without examining and providing additional basic information creates a kind of "enchanted forest of statistics"; an answer emerges from within, but no one knows exactly how it came about. In the worst-case "enchanted forest" scenario, readers aretold that the analysis leads to ac ertain conclusion without being advised of the steps that led to that conclusion and have no way to interpret the practical importance of the findings. They may be left with the feeling that the numbers have been tortured, using incomprehensible and unexplained techniques, to produce the researchers' desired conclusion. An increasing number of "enchanted forest" examples are found in the burgeoning field of predictive modeling, developed to enable early identification and proactive intervention with high-risk patients. 8 Because most predictive models arep roprietary, their specific algorithms aret ypically unavailable in the published literature. 8,9 Yeta rticles touting the benefits of these tools, often claiming remarkably high success rates and improvements over standardand published multivariate techniques, are common. [10][11][12] One article from the published literatured escribed many tasks necessaryi na nalyzing medical data adequately (e.g., accounting for curvilinearity and skewness, categorical effects such as disease severity classification, and interaction effects) as outside the realm of standardt echniques and thereforer equiring the derivation of "a formula just like the regression technique, except that the formula is morecomplicated and mored ifficult to understand." 10 This assertion is questionable, since the effects discussed by the authors can be handled with techniques such as exponentiation, logarithmic transformation, dummy-variable coding, and interaction coding, which are interpretable using standardmultivariate textbook methods. The article went on to claim that 34% of the variance in per-member-per-month (PMPM) medical charges could be explained with the authors' proprietaryt ool, compared with about 10% to 15% for standardm ethods, without providing any details of the statistical analysis that produced this conclusion. 10 It is not surprising that the predictive modeling field is currently the target of concerns about "black box" techniques. 8 Concerns of this type ares ometimes expressed about epidemiological studies as well. For example, al etter to the Lancet complained about the misleading use of risk ratios in the medical literature. 13 The authors pointed out that changes in death rate from 2t o1p er 100 and from 2t o1p er 1,000 both represent relative risk reductions of 50%, yet clearly represent verydifferent actual risk levels. They used findings of the Women' sH ealth Initiative (WHI) study as an example.
Comparing a regimen of estrogen plus progestin versus placebo, the WHI had documented a hazard ratio of about 1.24 for invasive breast cancer, which one editorial had described as "a 24% increase in breast cancer risk."14 But, as the authors of the Lancet letter pointed out, because the baseline risk of breast cancer was very low, the hazard ratio actually represented an incremental absolute risk of only about 0.49% over a 5- to 6-year follow-up, or about 0.09% per year.
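The distinction is easy to demonstrate numerically. A minimal sketch (Python) of the Lancet letter's death-rate example, contrasting relative and absolute measures:

```python
# Identical 50% relative risk reductions can mask very different
# absolute effects, as in the Lancet letter's example.
def risk_summary(baseline: float, treated: float) -> str:
    rrr = (baseline - treated) / baseline  # relative risk reduction
    arr = baseline - treated               # absolute risk reduction
    nnt = 1 / arr                          # number needed to treat
    return f"RRR={rrr:.0%}, ARR={arr:.2%}, NNT={nnt:,.0f}"

print(risk_summary(2 / 100, 1 / 100))    # RRR=50%, ARR=1.00%, NNT=100
print(risk_summary(2 / 1000, 1 / 1000))  # RRR=50%, ARR=0.10%, NNT=1,000
```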

Hey, Rocky, Watch Me Pull a Rabbit Out of My Hat! Findings Without Foundation
Promised a rabbit, Rocky, the cartoon squirrel, was invariably disillusioned when his moose friend, Bullwinkle, instead produced another creature, which seemed to come out of nowhere. When methodological decisions do not build logically on an existing body of knowledge, consumers of research information may feel the same way as Rocky did. One British Medical Journal reader characterized the potential problem as a "Texas sharpshooter" effect.15 Just as an unscrupulous marksman might draw his target around the bullet holes that he already shot into the side of a barn, researchers who choose methodology post hoc, instead of a priori, risk misleading their audience. Failure to demonstrate foundation and a logical progression in claims database research gives the impression, whether real or only apparent, that the methods were drawn around the desired findings.
Knowingly developing procedures so that they will produce a desired outcome is obviously unethical, and the value of a priori decision making in producing valid research findings is clear. But the claims database researcher often encounters gray areas because of limitations of the claims themselves.

Sometimes Foundations Have to Shift (a Little): The Dilemma Posed by Claims Data Limitations
The claims database researcher who sets out to perform work based solely on a priori decision making may quickly encounter a tough reality: when claims databases are less than ideally suited for the task at hand, revisions to even the most carefully selected a priori techniques might become necessary. This situation arises mainly from 2 root causes.
First is variable quality of the information in the diagnosis fields. While some studies have documented a high level of accuracy in the diagnosis fields of administrative claims data for common medical conditions such as asthma, respiratory infections, urinary tract infections, and acute myocardial infarction,16-18 it is clear that precision and straightforward interpretation are by no means guaranteed when using administrative claims as the data source. Even putting aside the possibility of misdiagnosis related to provider reimbursement levels, stigma, or diagnostic uncertainty,19,20 the codes appearing in administrative data may be unreliable or systematically biased for accurately diagnosed patients.21-23 In one study, 36% of the potential study population of type 2 diabetic patients had to be excluded from analysis because they also had 2 or more claims for type 1 diabetes in the same 1-year period.23 A comparison of medical records to database entries for primary care visits found that the administrative claims of accurately diagnosed patients contained a missing or incorrectly keyed primary diagnosis 34% of the time; secondary diagnoses were even less accurate.22

A second cause of revised a priori methods is billing practices. Systematic coding issues can occur from the use of preprinted encounter forms, which typically precode the 20 to 30 most common diagnoses seen in the medical practice.21 Because the top diagnoses coded by specialty provider offices tend to be concentrated in a particular body system or medical condition, they are often coded more accurately and precisely than in primary care provider or general practitioner offices, where the encounter form codes must cover a wider range of conditions. Similarly, diagnoses tend to be more precise in inpatient than in outpatient billing because of differences in setting, attention to reimbursement levels, and coder training.24

Working with claims for specialty or injectable medications, an activity that is increasing because of greater use of and cost for these drugs, can be particularly difficult. Because payers differ in billing requirements for biologic injectable medications, a multiple-payer dataset can contain different Health Care Financing Administration (HCFA, now known as the Centers for Medicare & Medicaid Services) Healthcare Common Procedure Coding System (HCPCS) codes for the same drug, as well as the same HCPCS code representing multiple drugs. For example, billing requirements posted in February 2006 for 1 payer assigned the same nonspecific "J" (injectable) code (J3490) to adalimumab, efalizumab, exenatide, peginterferon alfa-2a, and peginterferon alfa-2b.25 "Home-grown" (payer-specific) codes are also possible.
For the claims database researcher, all this imprecision means that there is often more than 1 reasonable approach to methodological decisions, and it may take some trial and error to find the best approach. To ignore this need invites mistakes, such as examining the wrong diagnostic codes or studying a particular HCPCS code in the mistaken belief that it refers to a particular medication when the provider actually used the code to represent a different drug. And when a trial and error process becomes necessary, failing to mention it to consumers of research information gives unwarranted credence to the quality of the data and ultimately to the study results.

For those who believe that attention to the methodological and ethical hazards of claims database analysis represents esoteric fascination with picayune detail, the current status of chronic kidney disease (CKD) treatment provides sobering evidence to the contrary. As a recent Journal of the American Medical Association commentary discusses, many current CKD standards of care are based primarily on well-done observational analyses of epidemiological data. In light of new studies using stronger randomized designs, some treatment-to-outcome associations observed in older studies are now suspected to be due to unmeasured factors,26 just like the association between household appliances and childhood academic achievement in the Coleman reanalysis.4 The result is consternation and confusion as providers and promulgators of CKD care standards try to adjust to the newer and more accurate research information.26

The moral of this cautionary tale is that claims database researchers risk promotion of suboptimal practice if they fail to measure and acknowledge the limitations of their work. Similarly, unless consumers of information adopt a "caveat emptor" attitude, they risk being subject to the intentional or unintentional biases of researchers. Both researchers and consumers must learn to recognize and use "best practices."

"Best Practice" in Claims Database Research: Knowing It When You See It

"Best practice" work in claims database research is characterized by 4 hallmark features:
1. It goes beyond mere documentation that an association exists and instead investigates the specific nature and importance of that association.
2. It is utterly transparent about the number and type of analyses performed to reach study conclusions and reports completely all codes (diagnostic and procedural), details of the sample, and other information necessary to allow the work to be replicated by others.
3. It translates output from statistical analyses into terms that are understandable and useful to readers.
4. It investigates and candidly acknowledges the potential effects of ambiguities and limitations inherent in claims database analysis.

Following are some examples of "best practice" studies to illustrate the value of these approaches.

Study Overview: #1
A study of the economic impact of anemia, conducted by Nissenson et al. and published in the Journal of Managed Care Pharmacy (JMCP),27 was similar in purpose to the aforementioned AFIB study7 but distinguished by 2 key methodological differences. First, its study population was limited to patients with diseases (CKD, human immunodeficiency virus, rheumatoid arthritis, inflammatory bowel disease, congestive heart failure, solid-tumor cancers) that predisposed them to anemia; thus, both of the study's comparison groups had serious chronic illnesses. Second, in addition to calculating the payer's all-cause health care costs for anemic versus nonanemic patients (controlling for demographics, coverage type, disease category, and CCI), Nissenson et al. took a critical additional step in conducting a separate analysis of costs for medical services typically used in anemia management (e.g., transfusions, certain types of injections). They reported the percentage of patients using each anemia service for the sample overall and separately for each predisposing disease category. In a great example of transparency, they reported that services clearly attributable to anemia "accounted for only 5% to 11% of the cost differential between anemic and nonanemic patients." The remaining costs were attributable to "services without an anemia diagnosis code or another unambiguous relationship to anemia." The authors candidly acknowledged 2 possible explanations: (1) that the algorithm used to identify the anemia-related costs had failed to capture the full economic impact of the disorder, or (2) that anemia was a marker for underlying disease severity; that is, the association between anemia and cost was not causal.27

What Does Case Study #1 Show Us?
Appropriate caution in interpreting associations of the type observed in the AFIB study does not relegate hapless claims database researchers to pointing out cost differences, shrugging helplessly, and saying something like, "oh well, without medical records or a randomized trial there's really no way to tell." With reasonable effort and attention to good basic design, a claims database researcher can provide information that is both useful and consistent with the analyses performed.
Nissenson et al. created relatively homogeneous study groups by first sampling patients with serious chronic diseases, then subsampling to create comparison groups of patients with anemia and without anemia.27 The authors' basic design controlled for measurable differences between study groups, but they acknowledged the possibility that unmeasured factors could have affected their results. They supplemented their primary analyses with a thorough investigation of anemia-related services, providing information about patterns of treatment for their patient population of interest. They documented the percentage of total health care costs clearly attributable to anemia. They reported completely all codes used in their analysis, enabling other researchers to replicate and perhaps improve on their methods. Finally, they were candid about the limitations of their work.

Study Overview: #2
A study of electronic prescribing conducted by McMullin et al. and published in JMCP examined the impact of a hand-held computerized decision support system that combined electronic prescribing capability with educational messages targeted to physicians.28 Outcome measures were costs PMPM and per new prescription. In addition to examining results in the aggregate (comparing patients of users with those of nonusers of the devices in a controlled trial design), the study authors assessed the same utilization outcomes for the subset of 8 therapeutic categories most often targeted by the messaging. The study found that use of the targeted medications declined from 39.4% to 35.8% in the intervention group and increased from 40.1% to 43.4% in the control group.

What Does Case Study #2 Show Us?
Even a strong design does not always eliminate the need for further investigation. In this instance, study authors used post hoc analysis to investigate whether the prescribing system's educational messaging or other factors (e.g., just using the device itself, or a "Hawthorne effect" of participation in the project) produced the results observed in the controlled trial. By assessing whether the "dose" (the messaging) was related to the "response" (prescription cost), the study provided actionable information about electronic prescribing education, rather than an unexplained (and therefore less informative) association between the hand-held device and utilization outcomes.

Study Overview: #3
An assessment of the impact of prescription drug coverage on spending for hospital and physician services in senior populations was conducted by Briesacher et al. in the context of discussion about the Medicare Prescription Drug, Improvement, and Modernization Act of 2003.29 This study's purpose was to examine the accuracy of perceptions that providing seniors with drug coverage would result in improved access to necessary medications and ultimately lead to medical cost offsets.
The primary utilization outcome measures were expenditures for physician and hospital services. A fixed-effects panel model design was used to compare seniors who acquired drug coverage ("Gainers") with those who remained without coverage during the study period ("Nevers"). In addition to the study's primary analysis of change in aggregate health care spending for Gainers versus Nevers, Briesacher et al. performed an analysis of changes in spending on prescription drugs before and after the Gainers became eligible for drug coverage. The purpose of that analysis was to determine whether prescription drug use changed following the acquisition of coverage, thereby establishing "the mechanism by which drug coverage might influence medical care spending through increased use of medications."29 Supplementing the basic panel model design, this secondary analysis was intended to provide an indication of whether any differences in medical expenditure patterns between the 2 study groups were actually due to the Gainers' acquisition of drug coverage and not to unmeasured differences between Gainers and Nevers. However, the study found that acquisition of drug coverage increased prescription drug spending without any consistent effect on medical expenditures.

What Does Case Study #3 Show Us?
In retrospective observational analyses, even with a strong basic design that includes appropriate comparator groups and solid statistical controls, it is possible for unmeasured factors to influence study outcomes. The authors of study #3 recognized this problem and took the critical step of looking for evidence that the factor of interest actually produced study outcomes, instead of making assumptions about unmeasured causal effects.

Study Overview: #4
In assessing the risk of cough with angiotensin-converting enzyme inhibitors (ACEIs) versus angiotensin receptor blockers (ARBs), the Agency for Healthcare Research and Quality (AHRQ) applied an odds ratio derived from multivariate analyses (0.341 for odds of cough with ARBs compared with ACEIs) to a baseline cough rate of 8.9% for ACEIs based on clinical trials. The result (see Table 1), much more meaningful in practical terms than an odds ratio, was a number needed to treat (NNT): prevention of 1 ACEI-attributable cough would require treatment of approximately 18 patients with ARBs.30
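The AHRQ calculation can be reproduced from the two published inputs. A minimal sketch (Python) converting the odds ratio and baseline cough rate into an NNT:

```python
# Convert an odds ratio plus a baseline event rate into an NNT, as in
# the AHRQ comparison of ARBs with ACEIs for drug-induced cough.
def nnt_from_odds_ratio(odds_ratio: float, baseline_rate: float) -> float:
    baseline_odds = baseline_rate / (1 - baseline_rate)
    treated_odds = odds_ratio * baseline_odds
    treated_rate = treated_odds / (1 + treated_odds)
    absolute_risk_reduction = baseline_rate - treated_rate
    return 1 / absolute_risk_reduction

print(round(nnt_from_odds_ratio(0.341, 0.089)))  # -> 18 patients
```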

What Does Case Study #4 Show Us?
In addition to thoroughly documenting the methods used in its analysis, the AHRQ took the critical next step of translating the statistical output, an odds ratio, into clinically and practically meaningful terms.

Study Overview: #5
In a study of Helicobacter pylori eradication regimens, researchers determined that a large number of patients filled prescriptions for antisecretory medications before starting the H. pylori eradication regimen. Because it was impossible to determine from claims data alone whether the patients continued using the antisecretory medications with the regimen, patients were classified into treatment categories both with and without the antisecretory drugs, and results using the 2 methods were compared.31 Both methods yielded the same results.

Study Overview: #6
In an analysis of statin use after myocardial infarction, researchers discovered that cause of death was not available for a small number of cases in their administrative database. They calculated outcomes classifying the ambiguous cases in 3 different ways: as missing values, as cardiac deaths, and as noncardiac deaths.32 The results using all 3 methods were equivalent.

What Do Case Studies #5 and #6 Show Us?
The authors of both studies responded to unexpected circumstances by adjusting the original planned study methodology, comparing results using original and revised methods, and candidly reporting the results.

nn "Best Practice" Procedures
While it is impossible to enumerate here all the techniques that might be applicable to a given situation, there are practices that, if used consistently, will help distinguish appropriate from inappropriate claims database analysis. Researchers can follow these practices to help them engage in appropriate and effective data mining instead of falling victim to the lure of the fishing and trapping expedition. Consumers of information should look for evidence that researchers have followed these practices, and view results with suspicion when they have not.

Adopt an Appropriate Overall Approach
Researchers should understand and candidly acknowledge the limitations of claims databases and observational (rather than experimental) designs. Patients, health plan members, providers, and decision makers are poorly served when authors make more of their results than is warranted.

Build a Rational Basic Design
Researchers should begin the study design process with published information or other publicly available and accepted source(s) as the basis for initial methodological decision making. While designs employed in previous research (or logical variations thereof) are often a good starting point, various additional sources might be used. Classifications of drugs into comparison groups might be based on package labeling, as in 1 study of statin medications that stratified patients into "intensive" versus "standard" treatment based on whether product labeling indicated low-density lipoprotein cholesterol reduction of 40% or more for the drug and dosage combination.33 Classifications of patients might be based on statistical standards, as in a study of users of insulin lispro versus regular insulin, in which patients were grouped into quintile propensity score bins because of previous research documenting the effect of this stratification method on bias due to covariation.34
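As one illustration of the quintile propensity score approach, the following is a minimal sketch (Python); the covariates, synthetic data, and outcome are hypothetical and are not drawn from the cited insulin study:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic example: treatment assignment depends on observed covariates.
rng = np.random.default_rng(0)
n = 1000
age = rng.normal(60, 10, n)
comorbidity = rng.poisson(2, n)
p_treat = 1 / (1 + np.exp(-(0.05 * (age - 60) + 0.3 * (comorbidity - 2))))
df = pd.DataFrame({
    "age": age,
    "comorbidity": comorbidity,
    "treated": (rng.random(n) < p_treat).astype(int),
    "cost": rng.gamma(2.0, 500.0, n) + 200 * comorbidity,
})

# Step 1: model the probability of treatment from observed covariates.
ps = LogisticRegression().fit(df[["age", "comorbidity"]], df["treated"])
df["propensity"] = ps.predict_proba(df[["age", "comorbidity"]])[:, 1]

# Step 2: stratify into quintile bins so that treated and untreated
# patients are compared only at similar propensities.
df["ps_quintile"] = pd.qcut(df["propensity"], 5, labels=False)
print(df.groupby(["ps_quintile", "treated"])["cost"].mean().unstack())
```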

Sail in Uncharted Waters When Necessary
Where no standard exists, it is also reasonable to state this and describe the rationale for the approach used. For example, the authors of a study of hypoglycemic events and costs in patients with type 2 diabetes acknowledged a lack of consensus in the literature with respect to washout times to define new starts; their choice of 4 months was based both on the literature and on known pharmacokinetics of the drugs being studied.35 Similarly, in a study of the economic impact of herpes zoster (HZ), researchers noted difficulties in unequivocally attributing medical services to HZ because patients often present with vague pain symptoms, such as chest pain of unspecified etiology, up to a few weeks before diagnosis. Treating this problem as a study design issue, the authors included costs associated with these vague symptoms in their definition of HZ-related services, but compared patients diagnosed with HZ with a group of non-HZ patients, controlling for demographic and clinical factors.36

Logically Connect Method to Objective
Researchers should measure outcomes that are logically related to the phenomenon of interest. In a cost of illness study, the primary outcome will likely be disease-associated costs, but note that a diagnosis for the disease need not be required. For example, a study of depression-related costs might include a separate assessment of services attributable to injury or overdose to account for the possibility that depressive symptoms would lead to self-injurious behavior.
To the extent that total costs are known to be affected by a particular disease, those should be measured, but clinical reasonableness and common sense will often dictate the removal of some costs. For example, in a study of the economic effects of antihypertensive compliance, it is clearly inappropriate to include costs for appendectomies in the outcome measure.
Timing may be important in these interpretations as well. For example, in a study of statin use for primary prevention, it would be inappropriate to attribute medical cost differences during the first month of treatment to better statin compliance.

Perform Sensitivity Analyses When Appropriate
Whatever methodological approaches are taken, sensitivity analyses of the effects of methodological choices are always helpful and often essential. For example, a study of hypoglycemic events in diabetic patients measured a key variable, hemoglobin A1C value, using 3 different methods (mean overall, last value, lowest value) and reported results using all 3 methods, as sketched below.35 Numerous approaches to sensitivity analyses are possible. The key to conducting effective sensitivity analyses is to base them on reasonable scenarios and to be completely transparent with readers about the number of analyses performed. Transparency is essential because increasing the number of analyses also increases the probability of Type 1 (false positive) error.37
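A sensitivity analysis of this kind can be as simple as recomputing the key variable under each candidate definition and rerunning the downstream analysis once per definition. A sketch (Python; the lab-result layout is hypothetical, not taken from the cited study):

```python
import pandas as pd

# Hypothetical per-patient A1C lab history, sorted by draw date.
labs = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "draw_date": pd.to_datetime(
        ["2005-01-10", "2005-04-02", "2005-09-15", "2005-02-01", "2005-08-20"]),
    "a1c": [8.1, 7.4, 6.9, 9.2, 8.8],
}).sort_values(["patient_id", "draw_date"])

# Three reasonable definitions of "the" A1C value, as in the
# hypoglycemia study: overall mean, last observed, and lowest.
grouped = labs.groupby("patient_id")["a1c"]
definitions = pd.DataFrame({
    "mean_a1c": grouped.mean(),
    "last_a1c": grouped.last(),
    "lowest_a1c": grouped.min(),
})
print(definitions)  # downstream models would be rerun once per column
```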

Understand What Associations Do and Do Not Indicate
In interpreting associations between outcomes and other events or factors (e.g., benefit design features, medical condition, treatment), a good rule of thumb is that if an explanation for study findings is worth mentioning, it is worth investigating. If patients taking drug A have lower health care costs than patients taking drug B, a statement that this pattern is due to better medication compliance for drug A should be supported by evidence that (1) drug A's compliance is better than drug B's, and (2) better compliance is linked to lower health care costs.
Explorations of the process underlying an association do not replace the primary analyses documenting the association; they simply explore the association in sufficient detail so that conclusions are supportable. In technical terms, these explorations provide "construct validity"; that is, by assessing the mode of action underlying the outcome, they help document whether the study measures actually represent what the study authors believe they represent.38 For example, in the McMullin et al. study of the electronic prescribing system, the separate assessment of targeted drug categories provided evidence that differences between the study groups (users versus nonusers of the system) represented the effect of the system's educational messaging (i.e., what the authors were trying to test), not only the effect of having the device or participating in the trial.
Some creativity is often necessary in devising methods to appropriately investigate associations and to measure the suspected causal relationships. For example, Figure 1 depicts the hinted conclusion in the aforementioned study of antidepressant "guidelines,"6 while Figure 2 depicts an alternative approach that includes measurement and testing of the suspected causal mechanism. Note that the authors of the antidepressant study did not classify medications by side-effect profiles other than to hint that CANMAT's very diverse group of "first-line" drugs had more favorable side-effect profiles than other treatment choices had. However, in an actual test of the authors' hinted explanation, classification of drugs by side-effect profile would be a necessary first step. Thus, in the alternative design, patients taking antidepressants with favorable and with less favorable side-effect profiles are contrasted. Patients taking benzodiazepines (the therapeutic class used by the majority of patients not receiving antidepressants in that study) serve as an additional comparison group. Level of adherence to medication is a mediating (interim) outcome measure for the 2 antidepressant drug groups. For the final outcome measures, the analysis of all-cause costs is supplemented with a measure more logically related to the subject of the study: cost to treat depression and related conditions.
In most situations, the claims data are still available to the researchers after the initial analyses have been completed, making investigation of possible causal mechanisms feasible. Irrespective of whether the claims data are available, evidence from the research literature (e.g., similar studies conducted on other patient populations) should be assessed and, if appropriate, used as 1 component of the explanation of process. While it is unreasonable to expect investigation of every possible explanation, no matter how far-fetched, a reasonable expenditure of time to investigate major study findings should be expected of claims database researchers.

Use Disclosure, Not Data Torture
The analytic process (and the data presentation) should begin with basic descriptive measures (e.g., percentage, mean, median, and a measure of dispersion such as range or standard deviation) for the study sample overall and for key subgroups. When used at the outset of the analytic process, basic descriptive information helps the researcher to select a statistical technique that is appropriate for the data.39 Additionally, readers can use this information to compare the study sample to their own populations on key dimensions (e.g., age, gender, comorbidities, benefit design) and determine the degree to which study results apply to them; that is, the "external validity" (pragmatic applicability) of the work outside its original setting.38 Readers who are unfamiliar with the statistical techniques employed can better grasp the gist of the study findings from descriptive information (e.g., cost increased from $1.00 to $1.10, or 10%; compliance rates doubled from x to y; etc.). Importantly, if descriptive and multivariate results differ, the reader should be told why (e.g., number of household appliances was correlated with academic achievement until we controlled for parental income, suggesting that parental income may influence both the purchase of appliances and the academic achievement of children).
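In practice, this means tabulating simple summaries before any modeling. A minimal sketch (Python; the column names and values are hypothetical):

```python
import pandas as pd

# Hypothetical member-level extract; in practice this would come from
# the claims warehouse rather than being typed in.
claims = pd.DataFrame({
    "gender": ["F", "F", "M", "M", "F", "M"],
    "age_band": ["18-44", "45-64", "18-44", "45-64", "45-64", "18-44"],
    "annual_cost": [1200.0, 5400.0, 800.0, 7600.0, 3100.0, 950.0],
})

# Basic descriptive measures for the overall sample...
print(claims["annual_cost"].agg(["count", "mean", "median", "std", "min", "max"]))

# ...and for key subgroups, so readers can judge external validity.
print(claims.groupby(["gender", "age_band"])["annual_cost"]
            .agg(["count", "mean", "median", "std"]))
```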
Data presentation of statistical equations (e.g., regression, general linear model) should also include disclosure of the number of cases used in the equation (and how cases were selected for inclusion), discussion of the overall quality of the equation (e.g., percentage of variance explained, goodness of fit, accuracy of prediction), and an explanation of the substantive meaning of multivariate coefficients.
It is helpful to put results into terms that demonstrate the value (or lack of value) of outcomes to the reader, as the AHRQ did when it translated an odds ratio into an NNT when comparing ACEIs with ARBs regarding drug-induced cough.30 In addition to NNT, helpful expressions include absolute risk reduction and number needed to harm. With proper disclosure, each expression may be calculated from one of the others.40 Additional recommendations for full disclosure appear in the JMCP author guidelines and include discussion of power calculations performed and of the numbers of events observed and numbers at risk for each outcome, and presentation of P values and 95% confidence intervals around findings.41

Thoroughly Report Diagnosis, Procedure, and Drug Codes
To increase both the usefulness of the research and the reader's confidence in its results, complete reporting of diagnostic, procedural, and prescription drug codes used in all phases of the research, including identification and classification of the study sample and calculation of all outcomes, is essential. The time period(s) for measurement of all codes should be reported clearly as well. Drug coding should address drugs dispensed in community and mail pharmacies and, if applicable, drugs administered in physician offices (e.g., injectables). If study results could be affected by incomplete or ambiguous information, for example, the lack of specific drug information for hospital stays or the difficulties in identifying newer injectable medications using HCPCS codes, this problem should be candidly reported and addressed either by sensitivity analyses or design modification(s).

Detecting and Managing Problems
Despite the best efforts of researchers to base their results on published criteria, previous work, or reasonable decision rules, problems commonly arise in claims database analyses, particularly when studying relatively new treatments or complex outcomes. Several approaches are helpful in detecting these shortcomings and in guarding against an analytic "fishing expedition."

Verify Exclusion and Inclusion Criteria
First is an examination of the cases or individual claims excluded from a study sample or outcome measure. This examination provides some assurance that a code relevant to the outcome of interest, but used by providers in an unanticipated manner, is not overlooked. For example, if one of the outcome measures is hospitalization for a particular condition, it is helpful to run a frequency of diagnosis and procedure codes for hospital stays initially classified as being for other conditions. This step is particularly important when information about the payer's reimbursement practices (e.g., use of "home-grown" codes or reimbursement limits for certain diagnoses) is unavailable or limited.

Account for Date-of-Service Ambiguities
The use of reasonable time windows can help account for uncertainty in claim dates of service due to common billing practices. For example, in a study of practice patterns in management of upper gastrointestinal symptoms, researchers assessed the effect of moving dates of diagnostic tests backwards (earlier) by 2 days to allow for situations in which the physician received laboratory test results by telephone before the claim date.42
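Adjustments of this kind are straightforward to implement. A minimal sketch (Python; the field names are hypothetical, and the 2-day window follows the cited study):

```python
import pandas as pd

# Hypothetical diagnostic-test claims; field names are illustrative.
tests = pd.DataFrame({
    "patient_id": [101, 102],
    "service_date": pd.to_datetime(["2006-03-10", "2006-03-15"]),
})

# Move test dates 2 days earlier so that a prescription written after a
# telephoned laboratory result still sorts as following the test.
tests["adjusted_date"] = tests["service_date"] - pd.Timedelta(days=2)
print(tests)
```

Rerunning the sequencing analysis with and without the adjustment then shows whether the window choice materially affects the results.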

Consult an Independent Third Party
A consultation with 1 or more colleagues may become necessary in situations in which a methodological decision makes a difference in the study results and there is no definitive information available to guide the decision. To avoid "fishing" or "Texas sharpshooting," it is important to keep the colleague uninformed about the implications of his/her judgment for study findings. A reasonable approach is to explain the purpose of the study, present the methodological choices, and briefly review the methodological rationale underlying each one, without advising the colleague of the results obtained using each method. However, such a consultation should not be used as an excuse for failing to share information with readers; situations of this type should be documented as part of any presentations or papers written about the study.

Review Individual Claims for a Validity Check
Finally, after completing an administrative claims analysis according to a predetermined plan, it is wise practice to take time to look at the claims histories for a sample of patients. Did the methodology seem to classify the patients appropriately? Were any important details missed? It is not uncommon to discover, when looking at claims for a sample of patients, that a specification was missed in translating the study design into codes on the claims. For example, if patients are classified as having a hospitalization for a particular condition (e.g., ulcer disease), it is informative to look at a sample of the patients with hospitalizations, carefully examining the diagnosis and procedure codes for evidence that the classification makes sense (e.g., for procedures like endoscopies or other gastrointestinal procedures).

Finding Your Way Through the Enchanted Forest: The Role of Reviewers and Readers
The procedures and approaches described above should be routine in research with administrative claims. Researchers should be truthful and transparent in presenting methods and results and let readers decide for themselves if they agree with the interpretations. This suggestion is not novel, of course, but it is becoming increasingly important as multivariate statistical techniques become increasingly complex and esoteric and competition for scarce health care resources intensifies.
What do we at JMCP look for in claims database research? What should readers look for? A summary checklist of "best practices" is in Table 2. "Danger signs" that these practices were not followed include the following:
1. Methodological decisions are unexplained or inconsistent with previously published research, without an identifiable reason or explanation by the authors.
2. More than 1 approach to a key methodological decision

Table 2. Checklist of Best Practices for Claims Database Research
Associations
Data Analysis and Data Presentation
Research Process